ABSTRACT

Suppose $\{X_n\}$ is a $p$-th order autoregressive process with innovations in the domain of attraction of a stable law and the true order $p$ unknown. The estimate $\hat p$ of $p$ is chosen to minimize Akaike's Information Criterion over the integers $0, 1, \ldots, K$. It is shown that $\hat p$ is weakly consistent and the consistency is retained if $K \to \infty$ as $N \to \infty$ at a certain rate depending on the index of the stable law.

0. Introduction

Consider a stationary $p$-th order autoregressive (AR($p$)) process $\{X_n\}$:

$$X_n = \beta_1 X_{n-1} + \cdots + \beta_p X_{n-p} + \varepsilon_n$$

where $\{\varepsilon_n\}$ are independent, identically distributed (i.i.d.) random variables. The parameters $\beta_1, \ldots, \beta_p$ satisfy the usual stationarity constraints, namely all zeroes of the polynomial

$$z^p - \sum_{j=1}^{p} \beta_j z^{p-j}$$

have modulus less than 1.

Now assume that the true order $p$ is unknown but bounded by some finite constant $K(N)$. Our main purpose here will be to estimate $p$ by $\hat p$, where $\hat p$ will be obtained by minimizing a particular version of Akaike's Information Criterion (AIC) (Akaike, 1973) over the integers $\{0, 1, \ldots, K(N)\}$. Because we should be willing to examine a greater range of possible orders for our estimate as the number of observations increases, it makes sense to allow $K(N)$ to increase with $N$. In the finite variance case with $K(N) \equiv K$, AIC does not give a consistent estimate of $p$; in fact, there exists a nondegenerate limit distribution of $\hat p$ concentrated on the integers $p, p+1, \ldots, K$ (see Shibata, 1976).

It should be noted that AIC is a very general procedure which applies to a variety of statistical models. In general, for a given statistical model $\Omega_b$ with $k$-dimensional parameter vector $b$, AIC is defined as follows:

$$\phi(\Omega_b) = -2\Lambda(b) + 2k$$

where $\Lambda(b)$ is the maximized log-likelihood for the model $\Omega_b$. However, in the time series literature, AIC is usually defined in terms of a Gaussian likelihood; so for a $k$-th order autoregressive model, we will define AIC as follows:

$$\phi(k) = N \ln \hat\sigma^2(k) + 2k$$

where $\hat\sigma^2(k)$ is the estimate of the innovations variance obtained from the Yule-Walker (YW) estimating equations. We will choose as our estimate of $p$ the order which minimizes $\phi(k)$ for $k$ between 0 and $K$, that is,

$$\hat p = \arg\min_{0 \le k \le K} \phi(k).$$


In  the  case  where  two  or  more  orders  achieve  the  minimum,  we  will  take  the  smallest  of 
those  to  be  our  estimate. 
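
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes the series has already been centered, uses the biased sample autocovariances (divisor $N$), and solves the Yule-Walker equations by the Levinson-Durbin recursion. The function names `sample_autocov` and `yw_aic_order` are hypothetical.

```python
import math
import random


def sample_autocov(x, max_lag):
    """Biased sample autocovariances c(0..max_lag); x is assumed centered."""
    n = len(x)
    return [sum(x[t] * x[t + h] for t in range(n - h)) / n
            for h in range(max_lag + 1)]


def yw_aic_order(x, K):
    """Return (p_hat, phi) with phi[k] = N * ln(sigma2_hat(k)) + 2k, k = 0..K."""
    n = len(x)
    c = sample_autocov(x, K)
    sigma2 = c[0]                      # sigma2_hat(0)
    phi = [n * math.log(sigma2)]
    coef = []                          # Yule-Walker coefficients of the AR(k-1) fit
    for k in range(1, K + 1):
        # Levinson-Durbin step: k-th partial autocorrelation estimate
        num = c[k] - sum(coef[j] * c[k - 1 - j] for j in range(k - 1))
        rho = num / sigma2
        coef = [coef[j] - rho * coef[k - 2 - j] for j in range(k - 1)] + [rho]
        sigma2 *= 1.0 - rho * rho      # innovations variance of the AR(k) fit
        phi.append(n * math.log(sigma2) + 2 * k)
    # min() over indices returns the smallest minimizer when there are ties
    p_hat = min(range(K + 1), key=lambda k: phi[k])
    return p_hat, phi
```

Taking the minimum over the index range, rather than over the values, is what implements the tie-breaking convention of choosing the smallest minimizing order.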

For  certain  reasons,  we  may  also  want  the  autoregressive  parameters  to  vary  (with  N  ) 
over  some  region  of  the  parameter  space.  For  example,  consider  the  following  hypothesis 
testing  problem: 


$H_0$: $X_n = \varepsilon_n$

versus

$H_a$: $X_n$ is a nondegenerate autoregressive process.

We can consider a sequence of local alternatives $\{H_a^{(N)}\}$ converging to $H_0$ in the sense
that  all  the  AR  parameters  converge  to  zero  and  then  investigate  the  power  of  AIC  as  a 
statistical  test. 

The set of parameters which obey the stationarity condition is a complicated region in $\mathbb{R}^p$ (although the closure of this region is a compact set in $\mathbb{R}^p$). However, it can be shown (Barndorff-Nielsen and Schou, 1973) that there exists a one-to-one continuous mapping between the set of $\beta$'s and the set of the first $p$ partial autocorrelations $\{(\rho_1, \ldots, \rho_p) : \rho_j \in (-1,1) \text{ for } j = 1, \ldots, p\}$. Thus one can parametrize an AR($p$) process by its $p$ partial autocorrelations, each of which may vary freely in the interval $(-1,1)$. Moreover, one can show that for an AR($p$) process, $\rho_p = \beta_p$. For autoregressive order selection, the $\rho$-parametrization is somewhat more natural than the $\beta$-parametrization. That is, the "distance" between two autoregressive models with different orders is more easily seen in the $\rho$-parametrization.
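
As a rough illustration of the mapping between the two parametrizations, the sketch below uses the standard Levinson-Durbin recursion and its inverse. It is offered under that assumption only; the function names `pacf_to_ar` and `ar_to_pacf` are hypothetical and the code is not taken from the cited reference.

```python
def pacf_to_ar(rho):
    """Map partial autocorrelations (rho_1..rho_p) to AR coefficients (beta_1..beta_p)."""
    beta = []
    for k, r in enumerate(rho, start=1):
        # beta_j(k) = beta_j(k-1) - rho_k * beta_{k-j}(k-1), and beta_k(k) = rho_k
        beta = [beta[j] - r * beta[k - 2 - j] for j in range(k - 1)] + [r]
    return beta


def ar_to_pacf(beta):
    """Inverse map: run the recursion backwards (requires every |rho_k| < 1)."""
    beta = list(beta)
    rho = []
    for k in range(len(beta), 0, -1):
        r = beta[-1]                   # last coefficient of an AR(k) fit is rho_k
        rho.append(r)
        if k > 1:
            denom = 1.0 - r * r
            beta = [(beta[j] + r * beta[k - 2 - j]) / denom for j in range(k - 1)]
    return rho[::-1]
```

For instance, `pacf_to_ar([0.5, 0.2])` gives coefficients whose last entry is 0.2, matching the identity $\rho_p = \beta_p$ quoted above.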


1.  Infinite  variance  autoregressions 

We will be interested in the case where the innovations $\{\varepsilon_n\}$ are in the domain of attraction of a stable law with index $\alpha \in (0,2)$. If $E(|\varepsilon_n|) < \infty$ then we will assume that $E(\varepsilon_n) = 0$.

Recall that given observations $X_1, \ldots, X_N$ and known order $p$, it is possible to consistently estimate the AR parameters $\beta_1, \ldots, \beta_p$. In fact for least squares (LS) estimates $\hat\beta_1(l), \ldots, \hat\beta_l(l)$, where $l \ge p$:

$$N^{1/\delta}\left(\hat\beta_k(l) - \beta_k\right) \xrightarrow{a.s.} 0 \quad \text{for } \delta > \alpha$$

where $\beta_k = 0$ for $k > p$. For YW estimates, a slightly weaker result holds: convergence to 0 is in probability rather than almost sure.

We may also wish to consider AR models of the form

$$X_n - \mu = \sum_{k=1}^{p} \beta_k (X_{n-k} - \mu) + \varepsilon_n$$

where $\mu$ is unknown and we retain the same assumptions on the $\beta_k$'s and $\{\varepsilon_n\}$. It can be shown (Knight, 1987) that if we center the observed series by subtracting the sample mean $\bar X$ (i.e., $X'_n = X_n - \bar X$) and estimate $\beta_1, \ldots, \beta_p$ via the YW equations (using $X'_n$, $n = 1, \ldots, N$), we will still have $N^{1/\delta}(\hat\beta_k - \beta_k) \xrightarrow{p} 0$ for $\delta > \max(1, \alpha)$ and the convergence is almost sure for LS estimates. More generally, we can center the observed series by subtracting any location estimate $\hat\mu$ and estimate the $\beta$'s using the centered series. Depending on the precise convergence properties of $\hat\mu$ we may be able to obtain the full rate of convergence for the estimates of the AR parameters (Knight, 1987).
As stated earlier, we will want to vary the autoregressive parameters with $N$. For this reason, we will consider a triangular array of random variables

$$\begin{array}{l} X_1^{(1)} \\ X_1^{(2)},\ X_2^{(2)} \\ \quad\vdots \\ X_1^{(N)},\ X_2^{(N)},\ \ldots,\ X_N^{(N)} \end{array}$$

where each row is a finite realization of an AR($p$) process:

$$X_n^{(N)} = \sum_{k=1}^{p} \beta_k^{(N)} X_{n-k}^{(N)} + \varepsilon_n^{(N)}.$$

The corresponding triangular array of innovations, $\{\varepsilon_n^{(N)}\}$, consists of row-wise independent random variables sampled from a common distribution which is in the domain of attraction of a stable law. Given a single i.i.d. sequence $\{\varepsilon_n\}$, we could construct each element of the triangular array as follows:

$$X_n^{(N)} = \sum_{j=0}^{\infty} c_j(\beta^{(N)})\, \varepsilon_{n-j}.$$

We will require that $(\beta_1^{(N)}, \ldots, \beta_p^{(N)})$ are such that $(\rho_1^{(N)}, \ldots, \rho_p^{(N)})$ are contained in a closed (and hence compact) subset of $(-1,1)^p$ for all $N$. Since $\beta_p^{(N)} = \rho_p^{(N)}$, we can attempt to shrink $\beta_p^{(N)}$ to zero as $N$ goes to infinity and try to consistently estimate $p$ at the same time. (In the testing setup mentioned earlier, this would correspond to AIC providing a consistent test under a sequence of local autoregressive alternatives.) Intuitively, it would seem that the smaller $|\beta_p^{(N)}|$ is, the more difficult it should be to distinguish between a $p$-th order and a lower order AR model. From simulations, this does seem to be the case. This is the real motivation for allowing the parameters to vary with $N$.

Consider the following example. Suppose we observe a $p$-th order AR process which has $\rho_p$ very close to zero (say $\rho_p = 0.1$). To estimate the order of the process, we use a procedure which we know to be consistent. So for $N$ large enough, we will select the true order with arbitrarily high probability. However, for moderate sized $N$, the probability of underestimating $p$ may be very high. Conversely, if $|\rho_p|$ is close to 1, then even for small $N$ there will be high probability of selecting the true order. So by allowing $\rho_p = \beta_p$ to shrink to zero with $N$, we may get some idea of the relative sample sizes needed to get the same probability of correct order selection for two different sets of AR parameters. If we view order selection as a hypothesis testing problem (say testing a null hypothesis of white noise versus autoregressive alternatives), shrinking $\beta_p^{(N)}$ to zero is similar in spirit to the sequence of contiguous alternative hypotheses to a null hypothesis considered in Pitman efficiency calculations.

We should note that the partial autocorrelations do not have their usual finite variance interpretation; however, they can be unambiguously defined in terms of the regular autocorrelations, which themselves can be unambiguously defined in terms of the linear process coefficients (see Davis and Resnick, 1985). Moreover, the partial autocorrelations can be estimated consistently by recursive YW estimates just as in the finite variance case.

If we include unknown location, $\mu$, in the model, we will assume that it does not vary with $N$. To have $\mu$ vary with $N$ does not really make a lot of sense since it is, in a sense, a nuisance parameter in this situation.

We will provide an answer to the following question: under what conditions (if any) on $K(N)$ and $(\beta_1^{(N)}, \ldots, \beta_p^{(N)})$ will AIC provide a consistent estimate $\hat p$ of $p$? Bhansali (1983) conjectures that AIC may provide a consistent estimate of the order of an autoregressive process based on the rapid convergence of parameter estimates. However, he seems to conclude, from Monte Carlo results, that this may not be the case. If $K(N)$ is allowed to grow too fast then we may wind up severely overfitting much of the time; for example, $\hat p$ could equal $K(N)$ with high probability.
2. Theoretical Results

The  main  result  of  this  paper  is  contained  in  Theorem  7;  the  first  six  results  provide  the 
necessary  machinery  for  Theorem  7.  We  begin  by  stating  two  results  dealing  with  r-th 
moments  of  martingales  and  submartingales. 

Theorem 1. (Esseen and von Bahr, 1965) Let $S_n = \sum_{k=1}^{n} X_k$. If $E(X_n \mid S_{n-1}) = 0$ for $2 \le n$ and $X_n \in L^r$ for $1 \le r \le 2$ then

$$E(|S_n|^r) \le 2 \sum_{k=1}^{n} E(|X_k|^r).$$

(Note that $\{S_n, \sigma(S_n);\ n \ge 1\}$ is a martingale.)

Theorem 2. (cf. Chung, 1974, p. 346) If $\{X_n, \sigma(X_n);\ n \ge 1\}$ is an $L^r$-submartingale for some $r > 1$ then

$$E\left[\max_{1 \le n \le N} |X_n|^r\right] \le \left(\frac{r}{r-1}\right)^{r} E(|X_N|^r).$$

The following lemma will allow us to ignore the dependence on $N$ of the moments of $\{X_n^{(N)}\}$ by virtue of being able to bound the moments over any sequence of admissible parameters within a compact set.

Lemma 3. Let $\{X_n(\beta)\}$ be a stationary AR($p$) process with parameter $\beta$ and innovations $\{\varepsilon_n\}$ in the domain of attraction of a stable law with index $\alpha$. Let $C$ be a compact set of the parameter space. Then for all $0 < \delta < \alpha$,

$$\sup_{\beta \in C} E\left[|X_n(\beta)|^{\delta}\right] < \infty.$$

Proof. $X_n(\beta) = \sum_{j=0}^{\infty} c_j(\beta)\, \varepsilon_{n-j}$ where $c_j(\beta)$ is a continuous function of $\beta$ for all $j$. Now

$$|X_n(\beta)| \le \sum_{j=0}^{\infty} |c_j(\beta)|\, |\varepsilon_{n-j}| \le \sum_{j=0}^{\infty} a_j |\varepsilon_{n-j}|$$

where $a_j = \sup_{\beta \in C} |c_j(\beta)|$. However, it can be shown that $|a_j| \le C j^p |x|^j$ where $|x| < 1$ and so $\sum_{j=0}^{\infty} |a_j|^{\gamma} < \infty$ for all $\gamma > 0$. Under this summability condition, it follows from Cline (1983) that the random variable

$$X = \sum_{j=0}^{\infty} a_j |\varepsilon_j|$$

is finite almost surely with

$$\lim_{x \to \infty} \frac{P[X > x]}{P[\,|\varepsilon_1| > x\,]} = \sum_{j=0}^{\infty} a_j^{\alpha} < \infty.$$

This implies that $E(X^{\delta})$ is finite for all $0 < \delta < \alpha$ and the result follows. □

The following lemma will allow us to treat moments of $\sum X_n$ the same as the moments of $\sum \varepsilon_n$ when $\alpha > 1$.

Lemma 4. Let $\{X_n\}$ be a zero mean stationary AR($p$) process with innovations $\{\varepsilon_n\}$ in the domain of attraction of a stable law with index $\alpha > 1$. Then for any $1 < r < \alpha$,

(a) $E\left|\sum_{n=1}^{N} X_n\right|^{r} = O(N)$

(b) $E\left[\max_{1 \le m \le N} \left|\sum_{n=1}^{m} X_n\right|^{r}\right] = O(N)$.

Proof.


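$$\sum_{n=1}^{N} \varepsilon_n = \sum_{n=1}^{N} \left( X_n - \sum_{k=1}^{p} \beta_k X_{n-k} \right) = \left( 1 - \sum_{k=1}^{p} \beta_k \right) \sum_{n=1}^{N} X_n + R_N$$

where $|R_N| \le \max_{1 \le k \le p} |\beta_k| \; p(p+1) \max_{1-p \le k \le N} |X_k|$. Thus

$$\sum_{n=1}^{N} X_n = C \left( \sum_{n=1}^{N} \varepsilon_n - R_N \right)$$

where $C = \left(1 - \sum_{k=1}^{p} \beta_k\right)^{-1}$. Thus by Minkowski's Inequality,

$$\left( E\left|\sum_{n=1}^{N} X_n\right|^{r} \right)^{1/r} \le C \left( E\left|\sum_{n=1}^{N} \varepsilon_n\right|^{r} \right)^{1/r} + C \left( E\left[|R_N|^{r}\right] \right)^{1/r}$$

and part (a) follows from Theorem 1 by noting that $E[|R_N|^r] = O(N)$. (It can actually be shown to be $o(N)$ by using a uniform integrability argument.)

Part (b) follows similarly from Theorem 2 by noting that

$$\max_{1 \le m \le N} \left|\sum_{n=1}^{m} X_n\right| \le C \max_{1 \le m \le N} \left|\sum_{n=1}^{m} \varepsilon_n\right| + C \max_{1 \le m \le N} |R_m|$$

and using Minkowski's Inequality. □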

The  following  theorem  deals  with  uniform  convergence  of  both  LS  and  YW 
autoregressive  parameter  estimates  in  the  case  where  location  is  known. 

Theorem  5.  Assume  known  location  4.  Let  K(N)  =  0(Ni)  for  5  <  2 ~  a  and  let  ||v|| 
denote  the  Euclidean  norm  of  the  vector  v  .  Then 

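(a) $\sqrt{N} \max_{p \le l \le K(N)} \|\hat\beta(l) - \beta(l)\| \xrightarrow{p} 0$ where $\beta_k = 0$ for $k > p$.

(b) $\sqrt{N} \max_{1 \le l \le K(N)} \|\tilde\beta(l) - \hat\beta(l)\| \xrightarrow{p} 0$.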

Note  that  the  vectors  are  not  fixed  length  but  may  vary  with  N . 

Proof. (a) The style of proof will mimic Hannan and Kanter (1977). For convenience we suppress the notation indicating the dependence of $\{X_n\}$, $\{\varepsilon_n\}$ and $\beta$ on $N$. For $l \ge p$ the LS estimating equations can be reexpressed as follows:

$$C_l \left( \hat\beta(l) - \beta \right) = r_l$$

where  r,  (/)■  2  e„X  •  Fix  8<-£^2-  and  set  K  =K(N)  =  0(N6).  For  each 
*=/+i  2 

-  2 

l ,  Ct  is  non-negative  definite  and  so  it  suffices  to  show  that  for  some  k  <  ~ , 

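(i) $\max_{p \le l \le K} N^{1/2 - \kappa} \|r_l\| \xrightarrow{p} 0$

(ii) $\min_{p \le l \le K} \min_{\|v\| = 1} N^{-\kappa} v' C_l v \xrightarrow{p} \infty$

where $K = K(N)$. If (i) and (ii) hold then clearly $\sqrt{N} \max_{p \le l \le K} \|\hat\beta(l) - \beta\| \xrightarrow{p} 0$.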
To prove (i), it suffices to show that


,  .  /  N 

£  max  Nl~u  T  T  eX„  , 

piliK  fi- \  ,«T+1  ' 


for  some  Y  <  — . 


is  1  *■/♦! 


£  £  jv(l-2«)ir  £ 


/  T  n  T 

max  £  emXH_i  +  £  e.X-., 
is/xx  *  '  nT,  ' 


AT  /  ** 

s  y  jV*1-2**  >  £  y  tHXH_i  +  £ 


>Ti 


N 

Z  ****-> 


+  I  |  2  £  JB  £  eHXH-j 


N 

E 


2  A'"-2*'  (*V/  +  "Wj  +  2  VHn,  WNJ  ) 


If $2\gamma < 1$ then by the so-called $c_r$-inequality


VN,j  SE  £  I*,*,-/ 12’  -OffW) 

L«*i 


Now note that $X_{n,v} = \sum_{j=1}^{p} \beta_j X_{n-j,v} + \varepsilon_{n,v}$. By the triangle inequality,


N 


N-K  i  xl 


n=K+l 

Now 


1 1/2 

JV 

p 

21 

1/2 

r  n  ,  v/2 

»  £ 

< 

2 

.  —  « 

2  £n-*.v 

J 

L  «=itr+i 

L**i  J 

J 

L  n=AT+l  J 

N 

N~K  £ 

n=K+l 


2  P**«-*., 

*=1 


-il  — 


N 


*  2  P;  2  2  ^n-k,\ 

j— 1  k=l  n=K+l 


-  i 


5  P  -  N  , 

1 4  ar*K  v  4 


p ;  2  2  +  ®-(i) 

>=1  *»1  »=*+! 


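It remains only to show that $N^{-\kappa} \sum_{n=K+1}^{N} \varepsilon_{n,v}^2 \xrightarrow{p} \infty$. If this is true then $N^{-\kappa} \sum_{n=K+1}^{N} X_{n,v}^2 \xrightarrow{p} \infty$ since the probability that this quantity stays bounded clearly must tend to zero.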

r 

K 


N 


N  N 

~K  2  <v  2 


»=*+! 


n«*+l  1 


2  vt  "t"  ^  22  vjvk^n-j^n-k 


1  Sj<kiK 


Now 


N  K  ,  ,  *  w  ,  , 

2  2  vk  el-k  =22  vk  en-k 


n*K+ 1 1 


t=  1  nsAT+l 


a:  ,  *-at  , 
»!>’  S  en 


*=1  ziajf+l 


W-AT  . 

=  z  *1  ■ 

n=K+l 


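Thus

$$N^{-\kappa} \sum_{n=K+1}^{N} \sum_{k=1}^{K} v_k^2\, \varepsilon_{n-k}^2 \xrightarrow{p} \infty$$

since

$$N^{-\kappa} \sum_{n=K+1}^{N-K} \varepsilon_n^2 \xrightarrow{p} \infty .$$

Thus we need only show that

$$N^{-\kappa} \sum_{n=K+1}^{N} \mathop{\sum\sum}_{1 \le j < k \le K} v_j v_k\, \varepsilon_{n-j} \varepsilon_{n-k} \xrightarrow{p} 0 .$$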

Now 


K 

4-1 

N 

,  k 

4-1 

N 

n~k 

Z*k 

Z  Vj 

Z  £n-jeii-k 

S^'ZKI 

Z  |V/I 

Z  en-;  e«-4 

4=2 

7*1 

»*s r+i 

4*2 

7*1 

»=k+l 

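Now take $\gamma < \alpha$ and note that $j \ne k$. If $\gamma < 1$ then

$$E\left|\sum_{n=K+1}^{N} \varepsilon_{n-j} \varepsilon_{n-k}\right|^{\gamma} \le E \sum_{n=K+1}^{N} |\varepsilon_{n-j} \varepsilon_{n-k}|^{\gamma} = O(N).$$

If $\gamma \ge 1$ then necessarily $\alpha > 1$. Thus $S_l = \sum_{n=K+1}^{l} \varepsilon_{n-j} \varepsilon_{n-k}$ is an $L^{\gamma}$-martingale and hence

$$E\left|\sum_{n=K+1}^{N} \varepsilon_{n-j} \varepsilon_{n-k}\right|^{\gamma} = O(N)$$

uniformly over $j \ne k$ by Theorem 1.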



Y  * 

_  K  k- 1 

N 

E 

AT*  S  Iv*|  £  |v.| 

Z  eH-jen-k 

*»2  )» 1 

• 

n=K+ 1 

- 

=  OiN'^KiN?)  =  o(l) 


since $|v_k| \le 1$ for all $k$.


(b) From the definitions of $\tilde C_l$, $C_l$, $\tilde r_l$ and $r_l$, it is easy to see that


(la) 


tn 


max  max 

1S/SJT  Hi, jit 


\<?,V  J)\ 


K  N  , 

*  Z  X?  +  Z  *n 

n*l  n=V-X+l 


and 


(lb) 


SN 


max  max  |  f,  (i )->*■,  (i )  |  S  TX.2  . 
isis*  Hist  1  11  " 


Thus using equations (1a) and (1b), we have

(2a) $N^{-\kappa} T_N = o_p(N^{-1})$

and

(2b) $N^{-\kappa} S_N = o_p(N^{-1})$


for  k  <  — .  Now  using  some  elementary  facts  about  vector  and  matrix  norms  and  equations 


(2a)  and  (2b),  we  get 

(3a)  nuu  AT*  11  ^  "C^11  s  =  op(l) 

and 


(3b) 


max  N~k  Hr,  -r, || 

IS  UK 


op(K(N)ll2N~l)  =  op(l/VN  ) 


where  the  matrix  norm  is  that  which  corresponds  to  the  Euclidean  vector  norm. 


Now from the definitions of $\tilde\beta(l)$ and $\hat\beta(l)$, we get

$$N^{-\kappa}\, \tilde C_l \left( \tilde\beta(l) - \hat\beta(l) \right) = o_p(1/\sqrt{N})$$

uniformly in $l$ by equation (3b). Finally we must show that the minimum eigenvalue of $N^{-\kappa} \tilde C_l$ tends in probability to infinity uniformly in $l$ since $\|(N^{-\kappa} \tilde C_l)^{-1}\|$ is (in the case of symmetric positive definite matrices) merely the reciprocal of this minimum eigenvalue. Note that for unit vectors $v$

$$N^{-\kappa} v' \tilde C_l v = N^{-\kappa} v' C_l v + N^{-\kappa} v' (\tilde C_l - C_l) v \ge N^{-\kappa} v' C_l v - N^{-\kappa} \|\tilde C_l - C_l\| \xrightarrow{p} \infty$$

uniformly over $l$ and unit vectors $v$ by condition (ii) of the proof of part (a) of this theorem and equation (3a) above. Therefore

$$\left\| (N^{-\kappa} \tilde C_l)^{-1} \right\| \xrightarrow{p} 0$$

as required. □

In the case where we have an unknown location parameter and we estimate it with some location estimate $\hat\mu$, we can obtain the following corollary.

Corollary  6. 

1.  If  (jl-jx)2  *  Op(Ny)  for  y  S min  [[-!>- + 

compact  subsets  of  the  parameter  space,  then  Theorem  5  still  holds.  For  a  >  1 ,  X 
satisfies  this  condition. 

2.  If  aSl  and  |1*X  and  Af(AO  =  0(N8)  for  8  <  -y ,  then  conclusions  (a)  and  (b)  of 


j  ,  0  uniformly  over  all 


Theorem  5  hold. 
Proof. 1. (a) Assume without loss of generality that $\mu = 0$. We can again reexpress the LS
estimating equations as follows:


where  now 


N 


<?t(i  J)  =  2  -liXX,., -|1) 

y»/+l 


and 


Li  U) 


N 

• 

r  p  .  i 

* 

-  I 

eH  +jl 

i 

M 

*«l+l 

L  *-i  J 

d 

** 


1-SP* 

km  l 


*  Ll  + 

By  similar  methods  to  those  used  in  the  proof  of  Theorem  5,  it  is  easy  to  show  that  for  some 


K<T’ 


max  N*~*  |jr**  ||  i  0, 

pi  lit  * 


(The term involving $\sum X_{n-j}$ is killed using Lemma 4.)
In  addition,  using  the  conditions  on 


Finally,  it  follows  easily  that 


min  min  N~Kv'C,v  — »  oo 
y»S/SJTKv|H  “ 
(b) Defining $T_N$ and $S_N$ analogously to the proof of Theorem 5, we again get that for


some  k  <  — , 
o 


$$N^{-\kappa} T_N = o_p(N^{-1})$$

and

$$N^{-\kappa} S_N = o_p(N^{-1})$$

and the rest of the proof follows as in the proof of Theorem 5.


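2. Everything follows from the fact that for any $0 < \delta < \alpha$,

$$E\left[ \max_{1 \le l \le K} \left| \sum_{n=l+1}^{N} X_n \right|^{\delta} \right] = O(N)$$

which implies that

$$\max_{1 \le l \le K} \left| \sum_{n=l+1}^{N} X_n \right| = O_p(N^{1/\delta}).$$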


So  by  taking  5  close  to  a  and  k  close  to  -£■ ,  we  get 


max 
1  SISK 


N 

Z  xm 

HMl+l 


and  conclusions  (a)  and  (b)  follow  directly  from  this. 


□ 


Theorem 7. If $\liminf_{N \to \infty} \sqrt{N}\, |\beta_p^{(N)}| > \sqrt{2p}$ and conclusions (a) and (b) of Theorem 5 hold for some $K(N)$ then

$$\hat p \xrightarrow{p} p.$$

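Proof. First we note that since $\hat p$ is integer-valued, $\hat p \xrightarrow{p} p$ is equivalent to $P[\hat p = p] \to 1$ (as $N \to \infty$). From here on, we will refer to $K(N)$ as $K$ and to $\beta_k^{(N)}$ as $\beta_k$, thus suppressing the dependence on $N$.

Moreover we will assume that the observations $X_n$ are already centered; that is, we have subtracted out the location estimate $\hat\mu$ (if we are assuming unknown location).

We now use the fact that

$$\hat\sigma^2(k) = \hat\sigma^2(0) \prod_{l=1}^{k} \left( 1 - \tilde\beta_l^2(l) \right) \quad \text{for } k \ge 1$$

where $\hat\sigma^2(0) = \frac{1}{N} \sum_{n=1}^{N} X_n^2$ and $\tilde\beta_k(l)$ is the YW estimate of $\beta_k$ ($1 \le k \le l$) in an AR($l$) model. Now $P[\hat p < p] \le P[\min_{0 \le k < p} \phi(k) \le \phi(p)]$.

Since

$$\min_{0 \le k < p} \phi(k) \ge N \sum_{l=1}^{p-1} \ln\left( 1 - \tilde\beta_l^2(l) \right) + N \ln \hat\sigma^2(0),$$

we can write

$$P[\hat p < p] \le P\left[ \ln\left( 1 - \tilde\beta_p^2(p) \right) \ge -2p/N \right] = P\left[ 1 - \tilde\beta_p^2(p) \ge \exp(-2p/N) \right] \le P\left[ N \tilde\beta_p^2(p) \le 2p \right].$$

However,

$$N \tilde\beta_p^2(p) = \left( \sqrt{N}\, |\beta_p| + o_p(1) \right)^2$$

and so

$$\limsup_{N \to \infty} P\left[ N \tilde\beta_p^2(p) \le 2p \right] = 0$$

since $\liminf \sqrt{N}\, |\beta_p| > \sqrt{2p}$. Thus $P[\hat p < p] \to 0$.

We also have that

$$P[\hat p > p] \le P\left[ \phi(k) < \phi(k-1) \text{ for some } p < k \le K \right] \le P\left[ N \min_{p < k \le K} \ln\left( 1 - \tilde\beta_k^2(k) \right) < -2 \right].$$

If the conclusions of Theorem 5 hold, it follows that

$$N \max_{p < k \le K} \tilde\beta_k^2(k) \xrightarrow{p} 0$$

and hence

$$N \min_{p < k \le K} \ln\left( 1 - \tilde\beta_k^2(k) \right) \xrightarrow{p} 0.$$

Therefore, $P[\hat p > p] \to 0$.

Thus $P[\hat p \ne p] \to 0$ and so $P[\hat p = p] \to 1$, which implies that $\hat p \xrightarrow{p} p$. □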

The "practical" implication of this theorem is that if $N$ is large, with high probability $\hat p$ will equal $p$ provided that $|\beta_p|$ is not too small with respect to $N$. Or in other words, for fixed (but large) $N$, the probability of selecting the correct order decreases as $|\beta_p|$ decreases. Finite sample Monte Carlo results seem to bear this out.
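
To get a feel for what "not too small with respect to $N$" might mean, one can read the condition of Theorem 7 as a rough threshold $|\beta_p| > \sqrt{2p/N}$. The sketch below simply evaluates that threshold. It is an asymptotic heuristic rather than a finite-sample guarantee, and the function name `min_detectable_beta` is hypothetical.

```python
import math


def min_detectable_beta(n, p):
    """Heuristic threshold sqrt(2p/n) read off the condition of Theorem 7."""
    return math.sqrt(2.0 * p / n)


# Thresholds for an AR(1) model at two sample sizes
for n in (100, 900):
    print(n, round(min_detectable_beta(n, 1), 4))
```

For $p = 1$ the threshold is about 0.141 at $N = 100$ and about 0.047 at $N = 900$, so a coefficient of 0.1 falls below the threshold at the smaller sample size and above it at the larger one.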
3. Simulation results

For illustrative purposes, a small simulation study was carried out using four symmetric stable innovations distributions with $\alpha = 0.5, 1.2, 1.9$ and $2.0$ (the latter being the normal distribution). The underlying processes were AR(1) processes with the AR parameter $\beta = 0.1, 0.5$ and $0.9$. The sample sizes considered were 100 and 900. For $N = 100$, the maximum order $K$ was taken to be 10 while for $N = 900$, $K$ was taken to be 15. 100 replications were made for each of the 24 possible arrangements of $\alpha$, $\beta$ and $N$. The results of the study are given in Tables 1 through 8.
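
The source does not say how the stable innovations were generated. One common way to set up such an experiment is sketched below, assuming the Chambers-Mallows-Stuck representation of a standard symmetric $\alpha$-stable variable; the function names `sym_stable` and `simulate_ar1`, the burn-in length and the seeding scheme are all hypothetical choices.

```python
import math
import random


def sym_stable(alpha, rng):
    """One draw from a standard symmetric alpha-stable law, 0 < alpha <= 2."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)             # Cauchy case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def simulate_ar1(beta, alpha, n, burn_in=200, seed=0):
    """Simulate n observations of X_t = beta * X_{t-1} + e_t with stable e_t."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(n + burn_in):
        x = beta * x + sym_stable(alpha, rng)
        if t >= burn_in:               # discard the start-up transient
            path.append(x)
    return path
```

With $\alpha = 2$ this representation yields a normal variable with variance 2, which matches the remark that the last case is the normal distribution up to scale.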


Estimated     AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              93     95     91
2               0      0      0
3               0      0      1
4               0      0      0
5               0      0      0
6               0      0      0
7               4      0      0
8               1      5      2
9               0      0      1
10-15           2      0      5

Table 2: Frequency of selected order for AR(1) process. $N = 900$, $\alpha = 0.5$

The results are much as expected. We can see that for $N = 100$ and $\beta_1 = 0.1$, AIC underestimates the true order with high probability. For $N = 900$, the probabilities of selecting the true order increase over those for $N = 100$.


Estimated     AR parameter
order         0.1    0.5    0.9
0              70      0      0
1              15     86     86
2               7      7      4
3               3      3      3
4               1      1      3
5               0      1      1
6               0      0      0
7               2      0      2
8               2      1      1
9               0      1      0
10              0      0      0

Table 3: Frequency of selected order for AR(1) process. $N = 100$, $\alpha = 1.2$


Estimated     AR parameter
order         0.1    0.5    0.9
0               5      0      0
1              78     74     75
2               7     14      9
3               2      2      7
4               2      5      2
5               1      1      2
6               1      1      0
7               3      1      3
8               0      0      0
9               0      0      1
10-15           1      2      1

Table 6: Frequency of selected order for AR(1) process. $N = 900$, $\alpha = 1.9$


Estimated     AR parameter
order         0.1    0.5    0.9
0              63      0      0
1              25     75     75
2               4      3     12
3               1      6      2
4               0      7      5
5               2      2      2
6               2      4      3
7               1      0      0
8               1      1      1
9               0      2      0
10              1      0      0

Table 7: Frequency of selected order for AR(1) process. $N = 100$, Normal distribution
Estimated     AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              83     79     80
2               3      3     11
3               4      4      3
4               4      6      0
5               2      3      3
6               0      0      0
7               0      4      0
8               4      1      1
9               0      0      2
10-15           0      0      0

Table 8: Frequency of selected order for AR(1) process. $N = 900$, Normal distribution


4.  Comments 


Bhansali and Downham (1977) propose a generalization of AIC. They propose to minimize $\phi^{*}(k) = N \ln \hat\sigma^2(k) + \gamma k$ where $\gamma \in (0,4)$. It is easy to see from the proof of the above result that their criterion will also lead to consistent estimates of $p$ under similar conditions on $K(N)$ and $\beta_p^{(N)}$. In fact, if $\gamma = \gamma(N) > 0$ satisfies $\gamma(N)/N \to 0$, then the criterion corresponding to $\phi^{**}(k) = N \ln \hat\sigma^2(k) + \gamma(N) k$ will consistently estimate $p$.
Specifically,  with  known  location,  the  estimate  will  be  consistent  provided 

with $\gamma(N)$ bounded away from zero and with the same conditions on $K(N)$. With an appropriate choice of $\gamma(N)$, this criterion will also be consistent in the finite variance case. However, if $\gamma(N)$ grows too quickly with $N$ then the criterion may seriously underestimate the true order $p$ in small samples in both the finite and infinite variance cases. In an application such as autoregressive spectral density estimation (assuming now finite variance), underestimation is more serious than overestimation since, if the order is underestimated, the resulting spectral density estimate may be lacking important features which may indeed exist.
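
The effect of the penalty weight can be illustrated with a small sketch. It assumes the product form of the innovations variance estimate used in the proof of Theorem 7, so that the criterion can be evaluated from $\hat\sigma^2(0)$ and a list of estimated partial autocorrelations. The function name `select_order` and the numbers in the usage note are hypothetical.

```python
import math


def select_order(n, sigma2_0, pacf, gamma=2.0):
    """Minimize phi(k) = n * ln(sigma2(k)) + gamma * k over k = 0..len(pacf).

    sigma2(k) is built up as sigma2_0 * prod_{l <= k} (1 - pacf[l-1]**2).
    """
    log_s2 = math.log(sigma2_0)
    phi = [n * log_s2]
    for k, rho in enumerate(pacf, start=1):
        log_s2 += math.log(1.0 - rho * rho)
        phi.append(n * log_s2 + gamma * k)
    # smallest minimizing order is returned on ties
    return min(range(len(phi)), key=lambda k: phi[k]), phi
```

With $N = 100$, $\hat\sigma^2(0) = 1$ and partial autocorrelation estimates 0.5 and 0.1, the choice $\gamma = 2$ (AIC) selects order 1, a light penalty of $\gamma = 0.5$ selects order 2, and a heavy penalty of $\gamma = 40$ selects order 0, which mirrors the underestimation risk described above.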